In recent years, there is a growing number of pre-trained models trained on a large corpus of data and yielding good performance on various tasks such as classifying multimodal datasets. These models have shown good performance on natural images but are not fully explored for scarce abstract concepts in images. In this work, we introduce an image/text-based dataset called Greeting Cards. Dataset (GCD) that has abstract visual concepts. In our work, we propose to aggregate features from pretrained images and text embeddings to learn abstract visual concepts from GCD. This allows us to learn the text-modified image features, which combine complementary and redundant information from the multi-modal data streams into a single, meaningful feature. Secondly, the captions for the GCD dataset are computed with the pretrained CLIP-based image captioning model. Finally, we also demonstrate that the proposed the dataset is also useful for generating greeting card images using pre-trained text-to-image generation model.
translated by 谷歌翻译
压缩高准确性卷积神经网络(CNN)的最新进展已经见证了实时对象检测的显着进步。为了加速检测速度,轻质检测器总是使用单路主链几乎没有卷积层。但是,单路径架构涉及连续的合并和下采样操作,始终导致粗糙和不准确的特征图,这些图形不利,无法找到对象。另一方面,由于网络容量有限,最近的轻质网络在表示大规模的视觉数据方面通常很弱。为了解决这些问题,本文提出了一个名为DPNET的双路径网络,并采用了实时对象检测的轻巧注意方案。双路径体系结构使我们能够与提取物相对于高级语义特征和低级对象详细信息。尽管DPNET相对于单路检测器几乎具有重复的形状,但计算成本和模型大小并未显着增加。为了增强表示能力,轻巧的自相关模块(LSCM)旨在捕获全局交互,只有很少的计算开销和网络参数。在颈部,LSCM扩展到轻质互相关模块(LCCM),从而捕获相邻尺度特征之间的相互依赖性。我们已经对Coco和Pascal VOC 2007数据集进行了详尽的实验。实验结果表明,DPNET在检测准确性和实施效率之间实现了最新的权衡。具体而言,DPNET在MS COCO Test-DEV上可实现30.5%的AP,Pascal VOC 2007测试集上的81.5%地图,MWITH近250万型号,1.04 GFLOPS,1.04 GFLOPS和164 fps和196 fps和196 fps,320 x 320输入图像的320 x 320输入图像。
translated by 谷歌翻译
由于对图像细节和语义进行了强大的能力,近年来提出了许多轻量级的双分辨率网络。但是,大多数人都忽略了边界信息的好处。本文介绍了一种轻量级的双分辨率网络,称为Drbanet,旨在通过边界信息来细化语义分割结果。 Dbanet采用双行架构,包括:高分辨率分支(HRB)和低分辨率分支(LRB)。具体而言,HRB主要由一组有效的反转瓶颈模块(EIBMS)组成,其学习具有更大接收领域的特征表示。 LRB由一系列EIBM和极轻的金字塔池模块(ELPPM)组成,其中通过分层残差连接使用ELPPM来捕获多尺度上下文。最后,旨在捕获HRB中的对象边界的边界监督头。 CITYSCAPES和CAMVID数据集的广泛实验表明,我们的方法在分割准确性和运行效率之间实现了有前途的权衡。
translated by 谷歌翻译
对象检测通常需要相当数量的计算来获得满意的性能,这是不友好的部署在边缘设备中。为了解决计算成本和检测准确性之间的权衡,本文介绍了一个名为DPNet的双路径网络,用于轻量级自我关注的高效对象检测。在骨干中,单个输入/输出轻量级自我关注模块(LSAM)旨在编码不同位置之间的全局相互作用。LSAM还扩展到功能金字塔网络(FPN)中的多输入版本,该版本用于捕获两条路径中的跨分辨率依赖性。Coco DataSet的广泛实验表明,我们的方法实现了最先进的检测结果。更具体地说,DPNET在COCO测试开发中获得29.0%AP,仅为320x320图像的1.14 GFLOPS和2.27M型号大小。
translated by 谷歌翻译
Modeling lies at the core of both the financial and the insurance industry for a wide variety of tasks. The rise and development of machine learning and deep learning models have created many opportunities to improve our modeling toolbox. Breakthroughs in these fields often come with the requirement of large amounts of data. Such large datasets are often not publicly available in finance and insurance, mainly due to privacy and ethics concerns. This lack of data is currently one of the main hurdles in developing better models. One possible option to alleviating this issue is generative modeling. Generative models are capable of simulating fake but realistic-looking data, also referred to as synthetic data, that can be shared more freely. Generative Adversarial Networks (GANs) is such a model that increases our capacity to fit very high-dimensional distributions of data. While research on GANs is an active topic in fields like computer vision, they have found limited adoption within the human sciences, like economics and insurance. Reason for this is that in these fields, most questions are inherently about identification of causal effects, while to this day neural networks, which are at the center of the GAN framework, focus mostly on high-dimensional correlations. In this paper we study the causal preservation capabilities of GANs and whether the produced synthetic data can reliably be used to answer causal questions. This is done by performing causal analyses on the synthetic data, produced by a GAN, with increasingly more lenient assumptions. We consider the cross-sectional case, the time series case and the case with a complete structural model. It is shown that in the simple cross-sectional scenario where correlation equals causation the GAN preserves causality, but that challenges arise for more advanced analyses.
translated by 谷歌翻译
Deep learning models are known to put the privacy of their training data at risk, which poses challenges for their safe and ethical release to the public. Differentially private stochastic gradient descent is the de facto standard for training neural networks without leaking sensitive information about the training data. However, applying it to models for graph-structured data poses a novel challenge: unlike with i.i.d. data, sensitive information about a node in a graph cannot only leak through its gradients, but also through the gradients of all nodes within a larger neighborhood. In practice, this limits privacy-preserving deep learning on graphs to very shallow graph neural networks. We propose to solve this issue by training graph neural networks on disjoint subgraphs of a given training graph. We develop three random-walk-based methods for generating such disjoint subgraphs and perform a careful analysis of the data-generating distributions to provide strong privacy guarantees. Through extensive experiments, we show that our method greatly outperforms the state-of-the-art baseline on three large graphs, and matches or outperforms it on four smaller ones.
translated by 谷歌翻译
Data-driven models such as neural networks are being applied more and more to safety-critical applications, such as the modeling and control of cyber-physical systems. Despite the flexibility of the approach, there are still concerns about the safety of these models in this context, as well as the need for large amounts of potentially expensive data. In particular, when long-term predictions are needed or frequent measurements are not available, the open-loop stability of the model becomes important. However, it is difficult to make such guarantees for complex black-box models such as neural networks, and prior work has shown that model stability is indeed an issue. In this work, we consider an aluminum extraction process where measurements of the internal state of the reactor are time-consuming and expensive. We model the process using neural networks and investigate the role of including skip connections in the network architecture as well as using l1 regularization to induce sparse connection weights. We demonstrate that these measures can greatly improve both the accuracy and the stability of the models for datasets of varying sizes.
translated by 谷歌翻译
Machine learning models are typically evaluated by computing similarity with reference annotations and trained by maximizing similarity with such. Especially in the bio-medical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotation entity's interpretation of the real world, this can lead to sub-optimal predictions even though the model achieves high similarity scores. Here, the theoretical concept of Peak Ground Truth (PGT) is introduced. PGT marks the point beyond which an increase in similarity with the reference annotation stops translating to better Real World Model Performance (RWMP). Additionally, a quantitative technique to approximate PGT by computing inter- and intra-rater reliability is proposed. Finally, three categories of PGT-aware strategies to evaluate and improve model performance are reviewed.
translated by 谷歌翻译
Explainable AI transforms opaque decision strategies of ML models into explanations that are interpretable by the user, for example, identifying the contribution of each input feature to the prediction at hand. Such explanations, however, entangle the potentially multiple factors that enter into the overall complex decision strategy. We propose to disentangle explanations by finding relevant subspaces in activation space that can be mapped to more abstract human-understandable concepts and enable a joint attribution on concepts and input features. To automatically extract the desired representation, we propose new subspace analysis formulations that extend the principle of PCA and subspace analysis to explanations. These novel analyses, which we call principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), optimize relevance of projected activations rather than the more traditional variance or kurtosis. This enables a much stronger focus on subspaces that are truly relevant for the prediction and the explanation, in particular, ignoring activations or concepts to which the prediction model is invariant. Our approach is general enough to work alongside common attribution techniques such as Shapley Value, Integrated Gradients, or LRP. Our proposed methods show to be practically useful and compare favorably to the state of the art as demonstrated on benchmarks and three use cases.
translated by 谷歌翻译
Cybercriminals are moving towards zero-day attacks affecting resource-constrained devices such as single-board computers (SBC). Assuming that perfect security is unrealistic, Moving Target Defense (MTD) is a promising approach to mitigate attacks by dynamically altering target attack surfaces. Still, selecting suitable MTD techniques for zero-day attacks is an open challenge. Reinforcement Learning (RL) could be an effective approach to optimize the MTD selection through trial and error, but the literature fails when i) evaluating the performance of RL and MTD solutions in real-world scenarios, ii) studying whether behavioral fingerprinting is suitable for representing SBC's states, and iii) calculating the consumption of resources in SBC. To improve these limitations, the work at hand proposes an online RL-based framework to learn the correct MTD mechanisms mitigating heterogeneous zero-day attacks in SBC. The framework considers behavioral fingerprinting to represent SBCs' states and RL to learn MTD techniques that mitigate each malicious state. It has been deployed on a real IoT crowdsensing scenario with a Raspberry Pi acting as a spectrum sensor. More in detail, the Raspberry Pi has been infected with different samples of command and control malware, rootkits, and ransomware to later select between four existing MTD techniques. A set of experiments demonstrated the suitability of the framework to learn proper MTD techniques mitigating all attacks (except a harmfulness rootkit) while consuming <1 MB of storage and utilizing <55% CPU and <80% RAM.
translated by 谷歌翻译